Cyclone: Type-Checking Regions

Type-Checking Regions

Because of recursive functions, there can be any number of live regions at run time. The compiler uses the following general strategy to ensure that only pointers into live regions are dereferenced:

Use compile-time region names. Syntactically these are just type variables, but they are used differently.
Decorate each pointer type and handle type with one or more region names.
Decorate each program point with a (finite) set of region names. We call the set the point’s capability.
To dereference a pointer (via *, ->, or subscript), each region name that appears in the pointer’s type name must be in the program point’s capability. Similarly, to use a handle for allocation, the handle type’s region name must be in the capability.
Enforce a type system such that the following is impossible: A program point P’s capability contains a region name `r that decorates a pointer (or handle) expression expr that, at run time, points into a region that has been deallocated and the operation at P dereferences expr.

This strategy is probably too vague to make sense at this point, but it may help to refer back to it as we explain specific aspects of the type system.

Note that in the rest of the documentation (and in common parlance) we abuse the word “region” to refer both to region names and to run-time collections of objects. Similarly, we confuse a block of declarations, its region-name, and the run-time space allocated for the block. (With loops and recursive functions, “the space allocated” for the block is really any number of distinct regions.) But in the rest of this section, we painstakingly distinguish region names, regions, etc.

Region Names

Given a function, we associate a distinct region name with each program point that creates a region, as follows:

If a block (blocks create stack regions) has label L, then the region-name for the block is `L.
If a block has no label, the compiler makes up a fresh region-name for the block.
In region r <`foo> s, the region-name for the construct is `foo.
In region r s, the region-name for the construct is `r.
In region h = open(k) s, the region-name for the construct is `r, assuming k has type region_key_t<`r,_> (where _ is some other region name of no consequence).

The region name for the heap is `H. Region names associated with program points within a function should be distinct from each other, distinct from any region names appearing in the function’s prototype, and should not be `H. (So you cannot use H as a label name, for example.) Because the function’s return type cannot mention a region name for a block or region-construct in the function, it is impossible to return a pointer to deallocated storage.

In region r <`r> s, region r s, and region r = open(k) s the type of r is region_t<`r> (assuming, that k has type region_key_t<`r,_>). In other words, the handle is decorated with the region name for the construct. Pointer types’ region names are explicit, although you generally rely on inference to put in the correct one for you.

Capabilities

In the absence of explicit effects (see below), the capability for a program point includes exactly:

`H
The effect corresponding to the function’s prototype. Briefly, any region name in the prototype (or inserted by the compiler due to an omission) is in the corresponding effect. Furthermore, for each type variable `a that appears (or is inserted), “regions(`a)” is in the corresponding effect. This latter effect roughly means, “I don’t know what `a is, but if you instantiate with a type mentioning some regions, then add those regions to the effect of the instantiated prototype.” This is necessary for safely type-checking calls that include function pointers.
The region names for the blocks and “region r s” statements that contain the program point

For each dereference or allocation operation, we simply check that the region name for the type of the object is in the capability. It takes extremely tricky code (such as existential region names) to make the check fail.

Type Declarations

Each pointer and region handle type is decorated with a set of region names. This set of region names is referred to as the store or effect qualifier of a type. For instance a pointer type might be int *`r+`H. This indicates the type of pointer to an integer that resides in the region named `r or the heap region, `H. Similarly, a region handle type might be region_t<`r1+`r2> which indicates a handle to region named `r1 or `r2.

A struct, typedef, or datatype declaration may be parameterized by any number of effect qualifiers. The store qualifiers are placed in the list of type parameters. They may be followed by their kind; i.e. “::E”. For example, given

  struct List<`a,`r>{`a hd; struct List<`a,`r> *`r tl;};

the type struct List<int,`H> is for a list of ints in the heap, while the type struct List<int,`l> is for a list of ints in some lexical region. Notice that all of the “cons cells” of the List will be in the same region (the type of the tl field uses the same region name `r that is used to instantiate the recursive instance of struct List<`a,`r>). However, we could instantiate `a with a pointer type that has a different region name.

Subtyping and Effect Qualifiers

A pointer type’s effect qualifier is part of the type. If e1 and e2 are pointers, then e1 = e2 is well-typed only if all the regions names mentioned in the the effect qualifier for e2’s type also appears in the effect qualifier of e1’s type. For instance, both assignments to b below are legal, while the assignment to a is not.

  void foo(int *`r a) {
    int *`r+`H b = a;
    if(!a) b = new 1;
    a = b;
  }

The store qualifier in the type of b indicates that it can point into the region named `r or into the heap region. Therefore, initializing b with a pointer into `r is certainly consistent with its store qualifier. Similarly, the second assignment to b is legal since it is updated to point to the heap region. However, the assignment to a is not permitted since the declared type of a claims that it is pointer into the region named `r alone and b may actually point to the heap.

For handles, if `r is a region name, there is at most one value of type region_t<`r> (there are 0 if `r is a block’s name), so there is little use in creating variables of type region_t<`r>. However, it could be useful to create variables of type region_t<`r1+`r2>. This is the type of handle to either the region named `r1 or to the region named `r2. This is illustrated by the following code:

  void foo(int a, region_t<`r> h) {
    region_t<`r+`H> rh = h;
    if(a) {
      rh = heap_region;
    }
  }

As always, this form of subtyping for effect qualifiers is not permitted under a reference. Thus, the assignment in the program below is not legal.

  void foo(int *`H *`r a) {
    int *`r+`H *`r b = a;
  }

Instead of the shorthand int *`r1+`r2 the more verbose int *@effect(`r1 + `r2) may also be used. The @effect qualifier can be used in place of the @region qualifier.

Function Calls

If a function parameter or result has type int *`r or region_t<`r>, the function is polymorphic over the region name `r. That is, the caller can instantiate `r with any region in the caller’s current capability as long as the region has the correct kind. This instantiation is usually implicit, so the caller just calls the function and the compiler uses the types of the actual arguments to infer the instantiation of the region names (just like it infers the instantiation of type variables).

The callee is checked knowing nothing about `r except that it is in its capability (plus whatever can be determined from explicit outlives assumptions), and that it has the given kind. For example, it will be impossible to assign a parameter of type int*`r to a global variable. Why? Because the global would have to have a type that allowed it to point into any region. There is no such type because we could never safely follow such a pointer (since it could point into a deallocated region).

Explicit and Default Effects

If you are not using existential types, you now know everything you need to know about Cyclone regions and memory management. Even if you are using these types and functions over them (such as the closure library in the Cyclone library), you probably don’t need to know much more than “provide a region that the hidden types outlive”.

The problem with existential types is that when you “unpack” the type, you no longer know that the regions into which the fields point are allocated. We are sound because the corresponding region names are not in the capability, but this makes the fields unusable. To make them usable, we do not hide the capability needed to use them. Instead, we use a region bound that is not existentially bound.

If the contents of existential packages contain only heap pointers, then `H is a fine choice for a region bound.

These issues are discussed in Advanced Features.